Word Segmentation in Sanskrit Using Path Constrained Random Walks
Authors
Abstract
In Sanskrit, the phonemes at word boundaries can change and merge into new phonemes through a process called sandhi. As a result, a fused sentence can be segmented in multiple possible ways. We propose a word segmentation approach that predicts the most semantically valid segmentation for a given sentence. We treat the problem as a query expansion problem and use the path-constrained random walks framework to predict the correct segments.
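The abstract combines two steps: sandhi lets one surface string correspond to several segmentations, and a path-constrained random walk (PCRW) is used to choose among them. The sketch below is a minimal, hypothetical illustration of both steps, not the authors' implementation. Assumptions: the sandhi table covers only two guṇa rules (a/ā + i → e, a/ā + u → o); `LEXICON`, `GRAPH`, the relation names `cooccurs` and `synonym`, `PATH_WEIGHTS`, and all helper functions are invented for illustration. Each candidate is scored against context words as score(c, w) = Σ_P θ_P · h_{c,P}(w), where h_{c,P} is the distribution reached by a walk from c that may only follow the relation sequence P. In the general PCRW framework the weights θ_P are learned; here they are fixed by hand. Treating the context words as the "query" is only a loose stand-in for the paper's query-expansion formulation.

```python
from collections import defaultdict

# Toy vowel-sandhi table (assumption: only two guna rules, for illustration).
# Fused surface vowel -> (final vowel of left word, initial vowel of right word).
SANDHI = {
    "e": [("a", "i"), ("ā", "i")],  # a/ā + i -> e
    "o": [("a", "u"), ("ā", "u")],  # a/ā + u -> o
}

# Hypothetical toy lexicon.
LEXICON = {"maha", "mahā", "indra", "sūrya", "udaya"}

# Hypothetical typed graph: relation name -> adjacency lists.
GRAPH = {
    "cooccurs": {
        "deva": ["indra", "mahā"],
        "sura": ["indra"],
        "indra": ["deva", "mahā"],
        "mahā": ["deva", "indra"],
        "maha": ["utsava"],
    },
    "synonym": {"deva": ["sura"], "sura": ["deva"]},
}

# Relation paths P and their weights theta_P (hand-set here, not learned).
PATH_WEIGHTS = {("cooccurs",): 1.0, ("synonym", "cooccurs"): 0.5}


def segmentations(s):
    """All splits of s into lexicon words, undoing at most one sandhi
    substitution at each word boundary."""
    if not s:
        return [[]]
    results = []
    for i in range(1, len(s) + 1):
        left, rest = s[:i], s[i:]
        # Boundary without sandhi: plain concatenation.
        if left in LEXICON:
            for tail in segmentations(rest):
                results.append([left] + tail)
        # Boundary with sandhi: the last vowel of `left` is a fused vowel.
        for left_vowel, right_vowel in SANDHI.get(left[-1], []):
            word = left[:-1] + left_vowel
            if word in LEXICON:
                for tail in segmentations(right_vowel + rest):
                    results.append([word] + tail)
    return results


def walk(start, path):
    """h_{s,P}: distribution over nodes after a random walk from `start`
    that may only follow the edge types listed in `path`, in order.
    Mass at a node with no edge of the required type is dropped."""
    dist = {start: 1.0}
    for relation in path:
        nxt = defaultdict(float)
        for node, p in dist.items():
            neighbours = GRAPH[relation].get(node, [])
            for n in neighbours:
                nxt[n] += p / len(neighbours)  # uniform step along this edge type
        dist = nxt
    return dist


def score_segmentation(seg, context):
    """Sum over context words c, paths P, and segments w of theta_P * h_{c,P}(w)."""
    total = 0.0
    for c in context:
        for path, theta in PATH_WEIGHTS.items():
            h = walk(c, path)
            total += sum(theta * h.get(w, 0.0) for w in seg)
    return total


def best_segmentation(s, context):
    """Highest-scoring candidate segmentation of s given the context words."""
    return max(segmentations(s), key=lambda seg: score_segmentation(seg, context))


print(segmentations("mahendra"))                # [['maha', 'indra'], ['mahā', 'indra']]
print(best_segmentation("mahendra", ["deva"]))  # ['mahā', 'indra']
```

With the context word deva, both relation paths reach indra and the one-step `cooccurs` path also reaches mahā, but no path reaches maha, so mahā + indra (score 1.5) outranks maha + indra (score 1.0). A realistic system would need the full set of sandhi rules and a much larger lexicon and graph.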
Similar resources
Semi-Supervised Chinese Word Segmentation Using Partial-Label Learning With Conditional Random Fields
There is rich knowledge encoded in online web data. For example, punctuation and entity tags in Wikipedia data define some word boundaries in a sentence. In this paper we adopt partial-label learning with conditional random fields to make use of this valuable knowledge for semi-supervised Chinese word segmentation. The basic idea of partial-label learning is to optimize a cost function that mar...
Automatic Sanskrit Segmentizer Using Finite State Transducers
In this paper, we propose a novel method for automatic segmentation of a Sanskrit string into different words. The input for our segmentizer is a Sanskrit string either encoded as a Unicode string or as a Roman transliterated string and the output is a set of possible splits with weights associated with each of them. We followed two different approaches to segment a Sanskrit text using sandhi ...
Design of a lean interface for Sanskrit corpus annotation
We describe an innovative computer interface designed for assisting annotators in the efficient selection of segmentation solutions for proper tagging of Sanskrit corpus. The proposed solution uses a compact representation of the shared forest of all segmentations. The main idea is to represent the union of all segmentations, abstracting on the sandhi rules used, and aligning on the input sente...
SPARSE: Seed Point Auto‐Generation for Random Walks Segmentation Enhancement in medical inhomogeneous targets delineation of morphological MR and CT images
In medical image processing, robust segmentation of inhomogeneous targets is a challenging problem. Because of the complexity and diversity in medical images, the commonly used semiautomatic segmentation algorithms usually fail in the segmentation of inhomogeneous objects. In this study, we propose a novel algorithm imbedded with a seed point autogeneration for random walks segmentation enhance...
Dual Cavity Segmentation of Left and Right Ventricles in Cardiac MRI by Guided Random Walks with Registration
In this paper we propose a new method for accurate segmentation of the left and right ventricles simultaneously in cardiac magnetic resonance images. Our approach is based on guided random walks and registration in order to efficiently exploit the prior shape knowledge. The contribution of the proposed method is in using registration of the pre-segmented data and then guided random walks segmen...